jhedberg
Member

@jhedberg jhedberg commented Oct 8, 2025

The BT_AUTO_PHY_UPDATE option was problematic in the sense that it didn't
really allow any kind of fine-tuning of what automation is desired. In
particular, it didn't allow specifying which PHY was the preferred one (it
was hardcoded to 2M) and also didn't allow specifying role-specific
(central vs peripheral) preferences.

To solve this, deprecate the old option, and instead introduce
role-specific options that allow specifying 1M/2M/Coded/None as the
preference. The new options have slightly different defaults than the old one:
for the central role 2M stays as the preference, but since it's uncommon to
require automated changes on the peripheral side, the default there is set to
None.

@jhedberg jhedberg added the Bluetooth Review label Oct 9, 2025
@jhedberg
Member Author

jhedberg commented Oct 9, 2025

Posting here the latest from Discord:

There shouldn't be a need for any of this automation in the host, neither for the PHY nor for the data length, due to the existence of HCI_LE_Set_Default_PHY and HCI_LE_Write_Suggested_Default_Data_Length commands that could either be included as part of the host's HCI init (taking their values from Kconfig), or from the application, for which we already have existing APIs.

Actually, we already have HCI_LE_Write_Suggested_Default_Data_Length in the HCI init, so the code in conn.c is (as the code comment there indicates) only needed for buggy controllers that don't otherwise take care of this. Either way, the Kconfig options for these become questionable.

The only place where the CONFIG_BT_USER_DATA_LEN_UPDATE and CONFIG_BT_USER_PHY_UPDATE options are slightly useful is for saving memory in the bt_conn_cb struct. However, they're not really related to triggers from the app side; rather, a peripheral app may be interested in these updates e.g. if the central goes and changes the values during a connection.

@jhedberg
Member Author

jhedberg commented Oct 9, 2025

Conclusion from the Bluetooth WG meeting:

The set_default HCI commands don't actually cause any procedures to be triggered in themselves, so the explicit triggering by the host still has a purpose. I'll therefore roll back to my earlier plan to just do the role split for the phy update. For data length, we can either remove or deprecate the "auto" Kconfig option, since it serves no purpose now that the controller "no_auto_dle" quirk comes from DTS, i.e. is build-time resolvable in itself.

Another improvement that can be done (follow-up PR) is to cache the controller max data length, since we read it already during HCI init, i.e. no need to redo this for every connection.
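
As a rough illustration of the caching idea, here is a minimal, self-contained sketch. It is not the Zephyr implementation: the struct, the function names and the mocked controller read are all hypothetical, and the values 251/2120 are placeholder numbers chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the controller's maximum data length values. */
struct data_len_cache {
	bool valid;
	uint16_t max_tx_octets;
	uint16_t max_tx_time;
};

static struct data_len_cache dl_cache;
static unsigned int controller_reads; /* counts simulated HCI round trips */

/* Stand-in for an HCI read of the maximum data length (mocked values). */
static void mock_read_max_data_len(uint16_t *octets, uint16_t *time)
{
	controller_reads++;
	*octets = 251U;
	*time = 2120U;
}

/* Called once from a (hypothetical) HCI init path. */
static void hci_init_cache_data_len(void)
{
	mock_read_max_data_len(&dl_cache.max_tx_octets, &dl_cache.max_tx_time);
	dl_cache.valid = true;
}

/* Called per connection; only reads the controller if the cache is empty. */
static void conn_get_max_data_len(uint16_t *octets, uint16_t *time)
{
	if (!dl_cache.valid) {
		hci_init_cache_data_len();
	}

	*octets = dl_cache.max_tx_octets;
	*time = dl_cache.max_tx_time;
}
```

With this shape, any number of connections share the single read done at init, so the per-connection path needs no extra HCI command.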

@cvinayak
Contributor

cvinayak commented Oct 9, 2025

Just re-iterating my statements from the WG meeting.

> just do the role split for the phy update.

Agree. And keep the feature available (default disabled is ok) for users who need the auto PHY update to be handled by the Host on connection establishment (i.e. while the connected callback is generated to applications). We do not want such users to invest in developing a solution themselves when migrating to a new Zephyr release.

> For data length, we can either remove or deprecate the "auto" Kconfig option, since it serves no purpose now that the controller "no_auto_dle" quirk comes from DTS, i.e. is build-time resolvable in itself.

Agree.

> Another improvement that can be done (follow-up PR) is to cache the controller max data length, since we read it already during HCI init, i.e. no need to redo this for every connection.

Agree.

@alwa-nordic
Contributor

alwa-nordic commented Oct 9, 2025

> Another improvement that can be done (follow-up PR) is to cache the controller max data length, since we read it already during HCI init, i.e. no need to redo this for every connection.

HCI spec does not prohibit asking for a larger value than the Controller supports. We can always do bt_le_set_data_len(conn, 0x00fb, 0x4290), which are the HCI-defined absolute limits.

@Thalley
Contributor

Thalley commented Oct 9, 2025

> HCI spec does not prohibit asking for a larger value than the Controller supports. We can always do bt_le_set_data_len(conn, 0x00fb, 0x4290), which are the HCI-defined absolute limits.

From 7.8.33 LE Set Data Length command (similar text exists for 7.8.35 LE Write Suggested Default Data Length command):

> This command allows the Host to suggest the maximum transmission payload size and maximum packet transmission time (connMaxTxOctets and connMaxTxTime - see [Vol 6] Part B, Section 4.5.10) to be used for LL Data PDUs on a given connection. The Controller may use smaller or larger values based on local information.

I think that might make sense to do. We probably want to use the largest data length, and since the host isn't using the results of bt_hci_le_read_max_data_len for anything anyway, it's kind of moot to perform that read (not to mention that even setting the data length to the maximum value supported by the controller doesn't ensure that it is indeed the value that the controller supports).
Of course, as @jhedberg mentioned, some controllers may have some odd behavior (e.g. they could just ignore the value from the host if it's outside their range, instead of applying the maximum they support).
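
To make the two options in this exchange concrete, here is a small sketch that contrasts always suggesting the HCI-defined limits quoted above (0x00fb octets, 0x4290 microseconds) with clamping to values previously read from the controller. The struct and function names are hypothetical and only illustrate the trade-off; they are not part of the Zephyr API.

```c
#include <stdint.h>

/* Upper bounds quoted in the thread for the LE Set Data Length parameters. */
#define DATA_LEN_TX_OCTETS_MAX 0x00fbU /* 251 octets */
#define DATA_LEN_TX_TIME_MAX   0x4290U /* 17040 microseconds */

/* Hypothetical container for the values suggested to the controller. */
struct data_len_req {
	uint16_t tx_octets;
	uint16_t tx_time;
};

/* Option 1: always suggest the absolute limits; the controller decides. */
static struct data_len_req suggest_absolute_max(void)
{
	struct data_len_req req = {
		.tx_octets = DATA_LEN_TX_OCTETS_MAX,
		.tx_time = DATA_LEN_TX_TIME_MAX,
	};

	return req;
}

/* Option 2: never suggest more than the controller reported earlier. */
static struct data_len_req suggest_clamped(uint16_t ctrl_octets, uint16_t ctrl_time)
{
	struct data_len_req req = suggest_absolute_max();

	if (ctrl_octets < req.tx_octets) {
		req.tx_octets = ctrl_octets;
	}
	if (ctrl_time < req.tx_time) {
		req.tx_time = ctrl_time;
	}

	return req;
}
```

Option 1 avoids the extra read entirely, while option 2 is the more defensive choice for controllers that might ignore an out-of-range suggestion.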

@jhedberg jhedberg changed the title from "Bluetooth: Host: Limit auto-phy update to central by default" to "Bluetooth: Host: Add role-specific auto PHY update options" Oct 10, 2025
@jhedberg
Member Author

@Thalley @cvinayak I've pushed an update to implement the initial change as discussed

Comment on lines 1814 to 1836
	switch (conn->role) {
#if defined(AUTO_PHY_CENTRAL)
	case BT_HCI_ROLE_CENTRAL:
		if (AUTO_PHY_CENTRAL_SUPPORTED(bt_dev.le.features) &&
		    phy_change_needed(conn, AUTO_PHY_CENTRAL)) {
			err = bt_le_set_phy(conn, 0U, AUTO_PHY_CENTRAL_PREF,
					    AUTO_PHY_CENTRAL_PREF, BT_HCI_LE_PHY_CODED_ANY);
		}
		break;
#endif
#if defined(AUTO_PHY_PERIPHERAL)
	case BT_HCI_ROLE_PERIPHERAL:
		if (AUTO_PHY_PERIPHERAL_SUPPORTED(bt_dev.le.features) &&
		    phy_change_needed(conn, AUTO_PHY_PERIPHERAL)) {
			err = bt_le_set_phy(conn, 0U, AUTO_PHY_PERIPHERAL_PREF,
					    AUTO_PHY_PERIPHERAL_PREF, BT_HCI_LE_PHY_CODED_ANY);
		}
		break;
#endif
	default:
		/* No PHY preferences set for the given role */
		break;
	}
Contributor

Given that phy_change_needed takes conn, the conn->role check could be moved to that function which would make the code here significantly simpler

Member Author

I'm all for finding ways to simplify this. However, I'm not sure how that exact proposal helps, since we also need the role to know what to pass to the bt_le_set_phy() call.

Contributor

> since we also need the role to know what to pass to the bt_le_set_phy() call.

Since the value is exclusively based on the conn->role, then we don't need to pass it along at all. The function can just take the conn pointer and nothing more :)
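
A minimal sketch of what that suggestion could look like, using mock types so it compiles on its own. Everything here is illustrative: `struct mock_conn`, `preferred_phy()`, `mock_set_phy()` and the enum names are hypothetical stand-ins rather than the host's real types, the preferences are hardcoded to mirror the defaults described in the PR (2M for central, none for peripheral), and the controller feature check from the reviewed code is left out for brevity.

```c
#include <stdint.h>

/* PHY and role identifiers for this sketch (not the Zephyr definitions). */
enum phy { PHY_NONE = 0, PHY_1M = 1, PHY_2M = 2, PHY_CODED = 3 };
enum role { ROLE_CENTRAL = 0, ROLE_PERIPHERAL = 1 };

/* Mock connection object holding only what the sketch needs. */
struct mock_conn {
	enum role role;
	enum phy tx_phy;
	enum phy rx_phy;
	unsigned int phy_requests; /* number of PHY update requests issued */
	enum phy requested;        /* last requested PHY */
};

/* Assumed build-time preferences, mirroring the defaults in the PR text. */
#define AUTO_PHY_CENTRAL_PREF    PHY_2M
#define AUTO_PHY_PERIPHERAL_PREF PHY_NONE

/* The role lookup lives here, so callers only pass the connection. */
static enum phy preferred_phy(const struct mock_conn *conn)
{
	switch (conn->role) {
	case ROLE_CENTRAL:
		return AUTO_PHY_CENTRAL_PREF;
	case ROLE_PERIPHERAL:
		return AUTO_PHY_PERIPHERAL_PREF;
	default:
		return PHY_NONE;
	}
}

/* Stand-in for the call that would start the PHY update procedure. */
static int mock_set_phy(struct mock_conn *conn, enum phy pref)
{
	conn->phy_requests++;
	conn->requested = pref;
	return 0;
}

/* Single entry point: decides whether an update is wanted and issues it. */
static int do_auto_phy_update(struct mock_conn *conn)
{
	enum phy pref = preferred_phy(conn);

	if (pref == PHY_NONE) {
		return 0; /* no preference configured for this role */
	}

	if (conn->tx_phy == pref && conn->rx_phy == pref) {
		return 0; /* already on the preferred PHY in both directions */
	}

	return mock_set_phy(conn, pref);
}
```

The caller then shrinks to a single `do_auto_phy_update(conn)` call, at the cost of the role switch moving into the helper.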

*/
static bool uses_symmetric_2mbit_phy(struct bt_conn *conn)
#if defined(AUTO_PHY_CENTRAL) || defined(AUTO_PHY_PERIPHERAL)
static bool phy_change_needed(struct bt_conn *conn, uint8_t phy)
Contributor

Minor: "Needed" is a bit strongly worded here; we never really need to update the PHY, do we?

(Not a blocker, and the name is OK, but otherwise consider can_update_phy or something instead?)

Member Author

I'm not sure "can" is any more intuitive here. This is whether we already have the PHY we prefer. If we do, then we don't "need" to do the procedure. What other word would be better?

Contributor

If we end up moving everything related to the PHY update into the function (see #97198 (comment)), it would probably just be do_auto_phy_update :D

Member Author

Yes, I can certainly refactor this so that everything is in a single PHY-related function. It might have a slight impact on code size, but likely not much (if any)

Comment on lines 424 to 425
config BT_AUTO_PHY_PERIPHERAL_NONE
bool "No PHY preference"
Contributor

I would consider putting "None" as the first choice

Member Author

Done

Member Author

Btw, the _NONE options shouldn't really be needed, since Kconfig has the optional keyword for choice statements, which makes it possible to have none of the options selected. The problem with optional (as I discovered) is that it doesn't let us have a default where one of the options is selected anyway. The only effect of a default for an optional choice is that when you enable the choice at the top level in menuconfig, the default is what is initially enabled; specifying a default doesn't implicitly enable the top-level choice.
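
For readers wondering how an explicit _NONE choice member might be consumed on the C side, here is a hedged sketch. Only the BT_AUTO_PHY_PERIPHERAL_NONE symbol appears in this PR excerpt; the other CONFIG_ names, the PHY_PREF_* constants and the helper functions are assumptions made for illustration. The `#define CONFIG_...` lines at the top simulate what the build system would normally generate.

```c
/* Simulated build output: exactly one member of each choice is set.
 * The _CENTRAL_2M name is an assumed analogue of _PERIPHERAL_NONE.
 */
#define CONFIG_BT_AUTO_PHY_CENTRAL_2M 1
#define CONFIG_BT_AUTO_PHY_PERIPHERAL_NONE 1

/* Local stand-ins for the PHY preference bits (1M, 2M, Coded). */
#define PHY_PREF_1M    0x01
#define PHY_PREF_2M    0x02
#define PHY_PREF_CODED 0x04

/* Central: map the selected choice member to a preference value. */
#if defined(CONFIG_BT_AUTO_PHY_CENTRAL_1M)
#define AUTO_PHY_CENTRAL_PREF PHY_PREF_1M
#elif defined(CONFIG_BT_AUTO_PHY_CENTRAL_2M)
#define AUTO_PHY_CENTRAL_PREF PHY_PREF_2M
#elif defined(CONFIG_BT_AUTO_PHY_CENTRAL_CODED)
#define AUTO_PHY_CENTRAL_PREF PHY_PREF_CODED
#endif

/* Peripheral: with _NONE selected, no preference macro gets defined. */
#if defined(CONFIG_BT_AUTO_PHY_PERIPHERAL_1M)
#define AUTO_PHY_PERIPHERAL_PREF PHY_PREF_1M
#elif defined(CONFIG_BT_AUTO_PHY_PERIPHERAL_2M)
#define AUTO_PHY_PERIPHERAL_PREF PHY_PREF_2M
#elif defined(CONFIG_BT_AUTO_PHY_PERIPHERAL_CODED)
#define AUTO_PHY_PERIPHERAL_PREF PHY_PREF_CODED
#endif

/* Role-specific code can then be compiled in or out per preference. */
static int central_auto_phy_enabled(void)
{
#if defined(AUTO_PHY_CENTRAL_PREF)
	return 1;
#else
	return 0;
#endif
}

static int peripheral_auto_phy_enabled(void)
{
#if defined(AUTO_PHY_PERIPHERAL_PREF)
	return 1;
#else
	return 0;
#endif
}
```

Because _NONE is an ordinary choice member, the choice can still carry a normal default, which is the property the optional keyword could not provide.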

There's a regression from commit 8cfad44
which introduced trying to enable advertising twice. Remove the other
attempt.

Signed-off-by: Johan Hedberg <[email protected]>
The BT_AUTO_PHY_UPDATE option was problematic in the sense that it didn't
really allow any kind of fine-tuning of what automation is desired. In
particular, it didn't allow specifying which PHY was the preferred one (it
was hardcoded to 2M) and also didn't allow specifying role-specific
(central vs peripheral) preferences.

To solve this, deprecate the old option, and instead introduce
role-specific options that allow specifying 1M/2M/Coded/None as the
preference. The new options have slightly different defaults than the old one:
for the central role 2M stays as the preference, but since it's uncommon to
require automated changes on the peripheral side, the default there is set to
None.

Signed-off-by: Johan Hedberg <[email protected]>
Document the changes related to Bluetooth auto PHY update Kconfig options.

Signed-off-by: Johan Hedberg <[email protected]>
Remove deprecated Kconfig option usage, and replace it with corresponding
options which yield the same behavior as before.

Signed-off-by: Johan Hedberg <[email protected]>
Make sure to consider both the new peripheral and central Kconfig options
when calculating needed TX buffers.

Signed-off-by: Johan Hedberg <[email protected]>
@jhedberg
Member Author

@Thalley @cvinayak There was a test case bug that @aescolar helped to uncover. See the first commit in this PR for the details. Fingers crossed that everything will be green now.

Thalley
Thalley previously approved these changes Oct 13, 2025
@aescolar
Member

> We can probably also find a way to convert the values to strings using strerror in some way

I'd say that would be nicer. Otherwise the next thing would be someone looking at one libc errno.h or another and getting confused. Just faster to see the error name printed.

@jhedberg
Member Author

> @Thalley @cvinayak There was a test case bug that @aescolar helped to uncover. See the first commit in this PR for the details. Fingers crossed that everything will be green now.

Turns out this was a red herring :( It was a bug, but not the cause of the test case failure. Only other idea I have is to 100% restore the PHY config for this test case, in case it depends on both sides to have "auto-phy 2M" enabled.

@jhedberg
Member Author

Now we have

d_00: @00:00:07.159852 ERROR: (WEST_TOPDIR/zephyr/tests/bsim/bluetooth/audio/src/bap_scan_delegator_test.c:141): PA timeout

@jhedberg jhedberg closed this Oct 13, 2025
@github-project-automation github-project-automation bot moved this from In Review to Done in Bluetooth LE Audio Oct 13, 2025
@jhedberg jhedberg reopened this Oct 13, 2025
@Thalley
Contributor

Thalley commented Oct 13, 2025

> Now we have
>
> d_00: @00:00:07.159852 ERROR: (WEST_TOPDIR/zephyr/tests/bsim/bluetooth/audio/src/bap_scan_delegator_test.c:141): PA timeout

I also get that for #91587.
When the timing changes, the scheduling changes, and sometimes that causes issues like this.

Sometimes it helps nudging the -start_offset and -rs values in the test script, but it's pretty hard to know what values to set (I managed to get the above to pass locally, but fail in CI). Maybe @cvinayak can help provide some reasonable values there?

@aescolar
Member

> (I managed to get the above to pass locally, but fail in CI)

@Thalley that should not be happening unless there is uninitialized memory in play (or the test is killed in CI due to taking too long).

@Thalley
Contributor

Thalley commented Oct 13, 2025

> > (I managed to get the above to pass locally, but fail in CI)
>
> @Thalley that should not be happening unless there is uninitialized memory in play (or the test is killed in CI due to taking too long).

I've seen that happen on many occasions over the years, and I've never been able to find any memory issues with valgrind, ASAN or UBSAN :)

@jhedberg
Member Author

@Thalley why hasn't the test been at least disabled, if a proper fix is not known? Blocking other PRs is justification enough to treat fixing or disabling it as a hotfix.

I can disable as part of this PR, or would you like to do it separately?

@cvinayak
Contributor

> @Thalley why hasn't the test been at least disabled, if a proper fix is not known? Blocking other PRs is justification enough to treat fixing or disabling it as a hotfix.
>
> I can disable as part of this PR, or would you like to do it separately?

This PR changes the default timeline of many tests, causing scheduling overlaps or change in connection event length. Restoring back the use of PHY update for the failing tests should make it pass, have you tried that?

@jhedberg
Member Author

> This PR changes the default timeline of many tests, causing scheduling overlaps or change in connection event length. Restoring back the use of PHY update for the failing tests should make it pass, have you tried that?

I haven't, since @Thalley pointed out that the same test is also failing for #91587 (which doesn't change any timings, AFAIK)

@Thalley
Contributor

Thalley commented Oct 14, 2025

> @Thalley why hasn't the test been at least disabled, if a proper fix is not known? Blocking other PRs is justification enough to treat fixing or disabling it as a hotfix.
>
> I can disable as part of this PR, or would you like to do it separately?

Because it's not specific to this test. This behavior/issue can somewhat randomly happen to any test (and especially any LE Audio test, given that we are often doing quite a lot on air in those tests with multiple devices).

> I haven't, since @Thalley pointed out that the same test is also failing for #91587 (which doesn't change any timings, AFAIK)

The PR here does change the timing, since it removes a few steps in a function.

The real problem is, IMO, that the scheduling overlaps aren't being handled well. In a real-world device, it would/should probably do a retry if it fails to sync to the PA or hits any similar issue (often triggered by a user interaction), but that isn't the case with the tests today.
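
As a rough sketch of the retry behavior described here, the following bounded retry loop wraps a sync attempt. It is purely illustrative: `pa_sync_with_retry()`, the callback type and the mock sync function are hypothetical and do not correspond to existing test or stack code.

```c
#include <stdbool.h>

/* Hypothetical callback that performs one PA sync attempt. */
typedef bool (*pa_sync_fn)(void *ctx);

/* Try to sync up to max_attempts times; report how many attempts ran. */
static bool pa_sync_with_retry(pa_sync_fn try_sync, void *ctx,
			       unsigned int max_attempts,
			       unsigned int *attempts_used)
{
	for (unsigned int i = 1U; i <= max_attempts; i++) {
		if (try_sync(ctx)) {
			*attempts_used = i;
			return true;
		}
	}

	*attempts_used = max_attempts;
	return false;
}

/* Mock attempt: fails until the failure budget in ctx is used up. */
static bool mock_sync(void *ctx)
{
	unsigned int *failures_left = ctx;

	if (*failures_left > 0U) {
		(*failures_left)--;
		return false;
	}

	return true;
}
```

A test using such a helper would tolerate an occasional scheduling overlap instead of failing on the first PA timeout, while still failing if the sync never succeeds.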

Restore the auto-phy config for tests which seem particularly sensitive
to timings related to this.

Signed-off-by: Johan Hedberg <[email protected]>
@jhedberg
Member Author

Just pushed an update not to modify the auto-phy stuff for the audio bsim tests.

@jhedberg
Member Author

@Thalley @cvinayak @aescolar CI is all green now. Thanks for your help with debugging this!

Comment on lines -154 to -161
err = bt_le_adv_start(BT_LE_ADV_CONN_FAST_1, ad, ARRAY_SIZE(ad), NULL, 0);
if (err) {
	FAIL("Advertising failed to start (err %d)\n", err);
	return;
}

err = start_advertising();

Contributor

Similar change but retains FAIL in the test in this PR: #97479

Member Author

@jhedberg jhedberg Oct 14, 2025

The test code seems to use printk() and FAIL() kind of interchangeably. E.g. don't you get duplicated output with your PR since start_advertising() has an internal printk() for the failure case?

@github-project-automation github-project-automation bot moved this from Done to In Review in Bluetooth LE Audio Oct 14, 2025

Labels

area: Bluetooth Audio, area: Bluetooth Classic, area: Bluetooth Controller, area: Bluetooth Host, area: Bluetooth, area: Samples, area: Tests, Bluetooth Review, Release Notes

Projects

Status: In Review

6 participants